Structured Co-reference Graph Attention for Video-grounded Dialogue
Abstract
A video-grounded dialogue system referred to as the Structured Co-reference Graph Attention (SCGA) is presented for decoding the answer sequence to a question regarding a given video while keeping track of the dialogue context. Although recent efforts have made great strides in improving the quality of the response, performance is still far from satisfactory. The two main challenging issues are as follows: (1) how to deduce co-reference among multiple modalities and (2) how to reason on the rich underlying semantic structure of video with complex spatial and temporal dynamics. To this end, SCGA is based on (1) a Structured Co-reference Resolver that performs dereferencing via building a structured graph over multiple modalities and (2) a Spatio-temporal Video Reasoner that captures local-to-global dynamics of video via gradually neighboring graph attention. SCGA makes use of a pointer network to dynamically replicate parts of the question for decoding the answer sequence. The validity of the proposed SCGA is demonstrated on the AVSD@DSTC7 and AVSD@DSTC8 datasets, two video-grounded dialogue benchmarks, and on the TVQA dataset, a large-scale videoQA benchmark. Our empirical results show that SCGA outperforms other state-of-the-art dialogue systems on both benchmarks, while an extensive ablation study and qualitative analysis reveal the performance gain and improved interpretability.
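The abstract says SCGA uses a pointer network to dynamically replicate parts of the input while decoding the answer. One widely used way to realize this kind of copying is the pointer-generator mixture: the final word distribution blends a vocabulary softmax with attention weights scattered onto the ids of the source tokens. The sketch below is a generic, hypothetical illustration of that technique under stated assumptions, not the authors' implementation; the function names, the scalar gate `p_gen`, and the toy inputs are all assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_augmented_distribution(
    vocab_logits: Sequence[float],
    attention_scores: Sequence[float],
    source_token_ids: Sequence[int],
    p_gen: float,
) -> List[float]:
    """Hypothetical pointer-generator step: mix generating with copying.

    vocab_logits: decoder logits over the output vocabulary at one step.
    attention_scores: unnormalized attention scores over the source tokens
        (assumed here to be the question tokens).
    source_token_ids: vocabulary id of each source token.
    p_gen: assumed scalar probability of generating instead of copying.
    """
    if len(attention_scores) != len(source_token_ids):
        raise ValueError("one attention score is needed per source token")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    p_vocab = softmax(vocab_logits)    # generation distribution
    p_copy = softmax(attention_scores)  # copy (pointer) distribution

    # Start from the generation distribution scaled by the gate.
    final = [p_gen * p for p in p_vocab]
    # Scatter-add the copy mass onto each source token's vocabulary id;
    # a token appearing several times in the source accumulates its mass.
    for token_id, weight in zip(source_token_ids, p_copy):
        final[token_id] += (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    # Toy example: 5-word vocabulary, a 2-token source whose tokens both map to id 3.
    dist = copy_augmented_distribution([0.0] * 5, [0.0, 0.0], [3, 3], p_gen=0.5)
    print(dist)
```

In the toy call, every vocabulary id receives 0.5 × 0.2 = 0.1 from generation, and id 3 additionally collects 0.5 × (0.5 + 0.5) = 0.5 from copying, so it ends near 0.6 while the total still sums to 1. Because the copy mass lands only on ids that occur in the source, the decoder can reproduce source words even when the generation softmax assigns them little probability.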
Similar resources
Spatio-Temporal Attention Models for Grounded Video Captioning
Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A comprehensive system would ultimately localize and track the objects, actions and interactions present in a video and generate a description that relies on temporal localization in order to ground the visual concepts. However, most existing automatic video captioning systems map from raw video da...
A Structured Distributional Semantic Model for Event Co-reference
In this paper we present a novel approach to modelling distributional semantics that represents meaning as distributions over relations in syntactic neighborhoods. We argue that our model approximates meaning in compositional configurations more effectively than standard distributional vectors or bag-of-words models. We test our hypothesis on the problem of judging event coreferentiality, which...
Grounded Semantics as Persuasion Dialogue
In the current work, we provide a formal Mackenzie-style persuasion dialogue for grounded semantics. We show that an argument is in the grounded extension iff the proponent is able to persuade a maximally sceptical opponent in the dialogue.
Grounded Objects and Interactions for Video Captioning
We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding with often limited or no emphasis on object interactions to address the problem of video understanding. In this paper, we propose SINet-Caption that learns to generate captions grounded over higher-order interactions between...
Facial Expression Grounded Conversational Dialogue Generation
We present a novel conversational language model that is grounded with information about facial expressions. To our knowledge this is the first in-depth examination of grounding natural language models with facial cues. We train a neural language model that uses automatically detected facial action unit intensity information in images alongside text to generate conversational dialogue. We evalu...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i2.16273